14 research outputs found

    Incremental Dead State Detection in Logarithmic Time

    Identifying live and dead states in an abstract transition system is a recurring problem in formal verification; for example, it arises in our recent work on efficiently deciding regex constraints in SMT. However, state-of-the-art graph algorithms for maintaining reachability information incrementally (that is, as states are visited and before the entire state space is explored) assume that new edges can be added from any state at any time, whereas in many applications, outgoing edges are added from each state as it is explored. To formalize the latter situation, we propose guided incremental digraphs (GIDs): incremental graphs which support labeling closed states (states which will not receive further outgoing edges). Our main result is that dead state detection in GIDs is solvable in $O(\log m)$ amortized time per edge for $m$ edges, improving upon the $O(\sqrt{m})$ per edge due to Bender, Fineman, Gilbert, and Tarjan (BFGT) for general incremental directed graphs. We introduce two algorithms for GIDs: one establishing the logarithmic time bound, and a second that explores a lazy, heuristics-based approach. To enable an apples-to-apples experimental comparison, we implemented both algorithms, two simpler baselines, and the state-of-the-art BFGT baseline using a common directed graph interface in Rust. Our evaluation shows 110-530x speedups over BFGT for the largest input graphs over a range of graph classes, random graphs, and graphs arising from regex benchmarks.
    Comment: 22 pages + references
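
    Since the paper's implementation language is Rust, a Rust sketch of the interface the abstract describes may help. Below is a naive baseline; the names (`Gid`, `add_edge`, `close`, `is_dead`) and the reading of "dead" as "closed, with only closed states reachable" are assumptions for illustration, not the paper's definitions.

```rust
use std::collections::{HashMap, HashSet};

/// Naive baseline for dead-state detection in a guided incremental digraph.
/// Assumed reading: a state is dead when it is closed and every state
/// reachable from it is also closed, so no future edge can revive it.
#[derive(Default)]
struct Gid {
    edges: HashMap<u32, Vec<u32>>,
    closed: HashSet<u32>,
}

impl Gid {
    /// Add an outgoing edge from an open (not yet closed) state.
    fn add_edge(&mut self, from: u32, to: u32) {
        debug_assert!(!self.closed.contains(&from), "closed states get no new edges");
        self.edges.entry(from).or_default().push(to);
    }

    /// Mark a state as fully explored: no further outgoing edges will be added.
    fn close(&mut self, state: u32) {
        self.closed.insert(state);
    }

    /// O(V + E) DFS per query: dead iff everything reachable is closed.
    fn is_dead(&self, state: u32) -> bool {
        let mut stack = vec![state];
        let mut seen = HashSet::new();
        while let Some(s) = stack.pop() {
            if !seen.insert(s) {
                continue;
            }
            if !self.closed.contains(&s) {
                return false; // an open state is reachable: may still grow
            }
            stack.extend(self.edges.get(&s).into_iter().flatten().copied());
        }
        true
    }
}
```

    The point of the paper's algorithms is to replace this per-query traversal with $O(\log m)$ amortized work per edge.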

    Automata-Based Stream Processing

    We propose an automata-theoretic framework for modularly expressing computations on streams of data. With weighted automata as a starting point, we identify three key features that are useful for an automaton model for stream processing: expressing the regular decomposition of streams whose data items are elements of a complex type (e.g., tuples of values), allowing the hierarchical nesting of several different kinds of aggregations, and specifying modularly the parallel execution and combination of various subcomputations. The combination of these features leads to subtle efficiency considerations that concern the interaction between nondeterminism, hierarchical nesting, and parallelism. We identify a syntactic restriction where the nondeterminism is unambiguous and parallel subcomputations synchronize their outputs. For automata satisfying these restrictions, we show that there is a space- and time-efficient streaming evaluation algorithm. We also prove that when these restrictions are relaxed, the evaluation problem becomes inherently computationally expensive.
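
    As a rough picture of the kind of hierarchically nested aggregation the abstract describes, here is an illustrative Rust sketch (not the paper's automaton model): a two-level aggregator over tuple-typed items, with an inner per-key sum and an outer maximum over those sums, both maintained incrementally.

```rust
use std::collections::HashMap;

/// Illustrative two-level nested aggregation (not the paper's automaton
/// model): an inner per-key sum and an outer max over those sums, both
/// maintained incrementally so no stream item is ever buffered.
struct NestedAggregator {
    per_key_sum: HashMap<String, i64>, // inner aggregation: sum per key
    global_max: i64,                   // outer aggregation: max of key sums
}

impl NestedAggregator {
    fn new() -> Self {
        Self { per_key_sum: HashMap::new(), global_max: i64::MIN }
    }

    /// Consume one (key, value) stream item in O(1) expected time;
    /// total space is proportional to the number of distinct keys.
    fn push(&mut self, key: &str, value: i64) {
        let sum = self.per_key_sum.entry(key.to_string()).or_insert(0);
        *sum += value;
        self.global_max = self.global_max.max(*sum);
    }
}
```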

    Safe Programming Over Distributed Streams

    The sheer scale of today's data processing needs has led to a new paradigm of software systems centered around requirements for high-throughput, distributed, low-latency computation. Despite their widespread adoption, existing solutions have yet to provide a programming model with safe semantics -- and they disagree on basic design choices, in particular with their approach to parallelism. As a result, naive programmers are easily led to introduce correctness and performance bugs. This work proposes a reliable programming model for modern distributed stream processing, founded in a type system for partially ordered data streams. On top of the core type system, we propose language abstractions for working with streams -- mechanisms to build stream operators with (1) type-safe compositionality, (2) deterministic distribution, (3) run-time testing, and (4) static performance bounds. Our thesis is that viewing streams as partially ordered conveniently exposes parallelism without compromising safety or determinism. The ideas contained in this work are implemented in a series of open source software projects, including the Flumina, DiffStream, and Data Transducers libraries
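
    One way to picture a partially ordered stream type, sketched in Rust (the names `Event`, `Item`, and `Barrier` are illustrative assumptions, not the thesis's actual abstractions): items on different parallel substreams carry no order relative to one another, while barriers are ordered with respect to everything, so a fold is deterministic exactly when the combining step is insensitive to the order of items between barriers.

```rust
/// Illustrative partially ordered stream type (names are assumptions, not
/// the thesis's API): items on different parallel substreams are unordered
/// relative to one another, while barriers are ordered w.r.t. everything.
enum Event<T> {
    /// Data item on one parallel substream.
    Item { substream: usize, data: T },
    /// Synchronization point, ordered with respect to all other events.
    Barrier,
}

/// A fold over such a stream is deterministic precisely when `combine` is
/// insensitive to the relative order of items between consecutive barriers.
fn fold_deterministic<T, A>(
    events: impl IntoIterator<Item = Event<T>>,
    init: A,
    combine: impl Fn(A, T) -> A, // must commute across substreams
    on_barrier: impl Fn(A) -> A, // flushes state at each merge point
) -> A {
    let mut acc = init;
    for e in events {
        acc = match e {
            Event::Item { data, .. } => combine(acc, data),
            Event::Barrier => on_barrier(acc),
        };
    }
    acc
}
```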

    Context Directed Reversals on Permutations and Graphs

    Efficient information processing is fundamental to activities ranging from genome maintenance to data management. This project analyzes the nature, and the unusual efficiency, of the information sorting performed by an elaborate genome maintenance system. Single-cell organisms called ciliates host an encrypted copy of their genome in a micronucleus. Their genome maintenance system often replaces the current functional genome by decrypting an encrypted copy. Decryption is performed through permutation sorting, using context directed reversals (cdr) and context directed block swaps (cds). The decryption mechanism has computational power and is programmable, giving compelling reasons to examine its mathematical properties. Generalizing several prior results, we identify the set of all signed permutations that are sortable by applications of cdr and cds. The methods used in this investigation are drawn from algebra, combinatorics, graph theory, and low-dimensional topology
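
    As a concrete illustration of the reversal primitive involved, here is a hedged Rust sketch: a signed reversal flips both the order and the signs of a segment of a signed permutation. (Which segments constitute legal cdr moves is determined by pointer contexts in the ciliate model; that applicability test is omitted here.)

```rust
/// Signed reversal: reverse a segment of a signed permutation and flip the
/// sign of every entry in it. (Whether a given segment is a legal cdr move
/// depends on pointer contexts in the ciliate model; that test is omitted.)
fn signed_reversal(perm: &mut [i32], i: usize, j: usize) {
    perm[i..=j].reverse();
    for x in &mut perm[i..=j] {
        *x = -*x;
    }
}

fn main() {
    let mut p = vec![1, -3, -2, 4];
    signed_reversal(&mut p, 1, 2); // reverse and negate the middle segment
    assert_eq!(p, vec![1, 2, 3, 4]); // the permutation is now sorted
}
```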

    Diastereoselective Hydrolysis of Branched Malonate Diesters by Porcine Liver Esterase: Synthesis of 5-Benzyl-Substituted Cα-Methyl-β-proline and Catalytic Evaluation

    Malonate diesters with highly branched side chains containing a preexisting chiral center were prepared from optically pure amino alcohols and subjected to asymmetric enzymatic hydrolysis by porcine liver esterase (PLE). Recombinant PLE isoenzymes were used in this work to synthesize diastereomerically enriched malonate half-esters from enantiopure malonate diesters. The diastereomeric excess of the product half-esters was further improved in the later steps of the synthesis, either by simple recrystallization or by flash column chromatography. The diastereomerically enriched half-ester was transformed into a novel 5-substituted Cα-methyl-β-proline analogue, (3R,5S)-1c, in high optical purity employing a stereoselective cyclization methodology. This β-proline analogue was tested for activity as a catalyst of the Mannich reaction. The β-proline analogue derived from the hydrolysis reaction by the crude PLE appeared to catalyze the Mannich reaction between an α-imino ester and an aldehyde, providing moderate to good diastereoselectivities; however, the enantioselectivities in the reaction were low. The second diastereomer of the 5-benzyl-substituted Cα-methyl-β-proline, (3S,5S)-1c, was prepared by enzymatic hydrolysis using PLE isoenzyme 3 and tested for its catalytic activity in the Mannich reaction. Amino acid (3S,5S)-1c catalyzed the Mannich reaction between isovaleraldehyde and an α-imino ester, yielding the "anti"-selective product with an optical purity of 99% ee